Implement serde for CSV and Parquet FileSinkExec #8646

Merged
andygrove merged 8 commits into apache:main from andygrove:serde-csv-parquet-sink
Dec 29, 2023

@andygrove (Member) commented Dec 24, 2023

Which issue does this PR close?

Closes #8645

Rationale for this change

Needed by Ballista, so that it can support DataFrame::write_xxx again.

What changes are included in this PR?

Implement serde for CSV and Parquet FileSinkExec, based on existing support for JSON FileSinkExec.
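
For readers unfamiliar with what "serde" involves here, the following is a minimal sketch of the round-trip pattern, not the code in this PR. The types `CsvSinkOptions` and `CsvSinkOptionsProto`, their fields, and the `roundtrip` helper are hypothetical stand-ins invented for illustration. The sketch assumes the in-memory form is stricter than the wire form, so the conversion back can fail.

```rust
use std::convert::TryFrom;

/// Hypothetical in-memory options (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct CsvSinkOptions {
    delimiter: u8, // assumed to be an ASCII byte
    has_header: bool,
}

/// Hypothetical wire representation, shaped like a generated protobuf struct.
#[derive(Debug, Clone, PartialEq)]
struct CsvSinkOptionsProto {
    delimiter: String,
    has_header: bool,
}

/// Serialization is infallible: every in-memory value has a wire form.
impl From<&CsvSinkOptions> for CsvSinkOptionsProto {
    fn from(opts: &CsvSinkOptions) -> Self {
        Self {
            delimiter: (opts.delimiter as char).to_string(),
            has_header: opts.has_header,
        }
    }
}

/// Deserialization can fail: the wire form allows values the
/// in-memory form cannot represent (e.g. a multi-byte delimiter).
impl TryFrom<&CsvSinkOptionsProto> for CsvSinkOptions {
    type Error = String;

    fn try_from(proto: &CsvSinkOptionsProto) -> Result<Self, Self::Error> {
        match proto.delimiter.as_bytes() {
            [b] => Ok(Self {
                delimiter: *b,
                has_header: proto.has_header,
            }),
            _ => Err(format!(
                "expected a single-byte delimiter, got {:?}",
                proto.delimiter
            )),
        }
    }
}

/// Convert to the wire form and back, as a round-trip test would.
fn roundtrip(opts: &CsvSinkOptions) -> Result<CsvSinkOptions, String> {
    let proto = CsvSinkOptionsProto::from(opts);
    CsvSinkOptions::try_from(&proto)
}
```

A round-trip test then asserts that `roundtrip(&opts)` returns a value equal to `opts`, which is the property a distributed engine needs in order to ship a plan node to another process.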

Are these changes tested?

Yes, new tests added.

Are there any user-facing changes?

github-actions bot added the "core" (Core DataFusion crate) label Dec 24, 2023
@andygrove changed the title from "WIP: Implement serde for CSV and Parquet FileSinkExec" to "Implement serde for CSV and Parquet FileSinkExec" Dec 29, 2023
@andygrove marked this pull request as ready for review December 29, 2023 18:30
@andygrove (Member, Author) commented:

@devinjdangelo fyi, this is now ready for review

@@ -1220,20 +1222,22 @@ message ParquetWriterOptions {
}

message CsvWriterOptions {
@andygrove (Member, Author) commented:

We have not released a version of DataFusion that contains CsvWriterOptions yet, so it is safe to change the field numbers here.
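
To illustrate why field numbers matter once a version has been released, here is a small self-contained sketch of how the Protocol Buffers wire format encodes a varint field. The helper names (`encode_varint`, `encode_varint_field`, `first_field_number`) are made up for this example and are not part of DataFusion or prost. The field number is baked into the key byte of every encoded field, so renumbering changes the bytes on the wire.

```rust
/// Wire type 0 (varint) from the Protocol Buffers encoding rules.
const WIRE_VARINT: u64 = 0;

/// Encode an unsigned integer as a base-128 varint.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Encode one varint field: key = (field_number << 3) | wire_type, then the value.
fn encode_varint_field(field_number: u64, value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    encode_varint((field_number << 3) | WIRE_VARINT, &mut out);
    encode_varint(value, &mut out);
    out
}

/// Decode the first key of a message and return its field number.
fn first_field_number(bytes: &[u8]) -> Option<u64> {
    let mut key: u64 = 0;
    for (i, byte) in bytes.iter().enumerate().take(10) {
        key |= u64::from(*byte & 0x7f) << (7 * i);
        if (*byte & 0x80) == 0 {
            return Some(key >> 3);
        }
    }
    None // truncated or over-long key
}
```

The same value written as field 1 and as field 2 produces different bytes (`[0x08, 0x01]` versus `[0x10, 0x01]`), so a reader built against the old numbering would attribute the value to a different field. That is only a problem if serialized messages from an older release exist, which is the point made above.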

@devinjdangelo (Contributor) commented:

This LGTM @andygrove! I am planning to attempt a more significant refactor in #8667, as discussed with @alamb, which might simplify adding support for some of the more advanced file writing options. Supporting externally defined file types is the goal.

@andygrove (Member, Author) commented:

Thanks for the reviews @avantgardnerio and @devinjdangelo

Labels

core Core DataFusion crate

Development

Successfully merging this pull request may close these issues.

Add serde support for CSV and Parquet FileSinkExec nodes

3 participants